exponential graph
Communication-Efficient Topologies for Decentralized Learning with O(1) Consensus Rate
Decentralized optimization is an emerging paradigm in distributed learning in which agents achieve network-wide solutions by peer-to-peer communication without the central server. Since communication tends to be slower than computation, when each agent communicates with only a few neighboring agents per iteration, they can complete iterations faster than with more agents or a central server. However, the total number of iterations to reach a network-wide solution is affected by the speed at which the agents' information is "mixed" by communication. We found that popular communication topologies either have large maximum degrees (such as stars and complete graphs) or are ineffective at mixing information (such as rings and grids). To address this problem, we propose a new family of topologies, EquiTopo, which has an (almost) constant degree and a network-size-independent consensus rate that is used to measure the mixing efficiency. In the proposed family, EquiStatic has a degree of ฮ(ln(n)), where nis the network size, and a series of time-dependent one-peer topologies, EquiDyn, has a constant degree of 1. We generate EquiDyn through a certain random sampling procedure. Both of them achieve an n-independent consensus rate. We apply them to decentralized SGD and decentralized gradient tracking and obtain faster communication and better convergence, theoretically and empirically.
74e1ed8b55ea44fd7dbb685c412568a4-Supplemental.pdf
Thisboundisattainedif nisanevennumber, ฮปn/2 isthatdesiredeigenvalue.Basedonthenumerical experiment, we know it ifn is an odd number, this bound cannot be attained. The ring topology is undirected, and is illustrated in Figure 1(a). The star topology is undirected, and is illustrated in Figure 1(b). Its weight matrix is generated according totheMetropolis rule,which issymmetric. The 2D-grid topology is undirected, and is illustrated in Figure 1(c).
Exponential Graph is Provably Efficient for Decentralized Deep Training
Decentralized SGD is an emerging training method for deep learning known for its much less (thus faster) communication per iteration, which relaxes the averaging step in parallel SGD to inexact averaging. The less exact the averaging is, however, the more the total iterations the training needs to take. Therefore, the key to making decentralized SGD efficient is to realize nearly-exact averaging using little communication. This requires a skillful choice of communication topology, which is an under-studied topic in decentralized optimization.In this paper, we study so-called exponential graphs where every node is connected to $O(\log(n))$ neighbors and $n$ is the total number of nodes. This work proves such graphs can lead to both fast communication and effective averaging simultaneously. We also discover that a sequence of $\log(n)$ one-peer exponential graphs, in which each node communicates to one single neighbor per iteration, can together achieve exact averaging. This favorable property enables one-peer exponential graph to average as effective as its static counterpart but communicates more efficiently. We apply these exponential graphs in decentralized (momentum) SGD to obtain the state-of-the-art balance between per-iteration communication and iteration complexity among all commonly-used topologies. Experimental results on a variety of tasks and models demonstrate that decentralized (momentum) SGD over exponential graphs promises both fast and high-quality training.
A Static Exponential Graph
Illustration of the 6 -node static exponential graph and its associated weight matrix. Transform (DFT) and its connection to circulant matrix, which plays the critical role in the proof. Use the conjugate argument then apply the similar procedure as i = 1 . The shape of the 6-node topologies discussed in Sec. The ring topology is undirected, and is illustrated in Figure 1(a).
DICE: Data Influence Cascade in Decentralized Learning
Zhu, Tongtian, Li, Wenhao, Wang, Can, He, Fengxiang
Decentralized learning offers a promising approach to crowdsource data consumptions and computational workloads across geographically distributed compute interconnected through peer-to-peer networks, accommodating the exponentially increasing demands. However, proper incentives are still in absence, considerably discouraging participation. Our vision is that a fair incentive mechanism relies on fair attribution of contributions to participating nodes, which faces non-trivial challenges arising from the localized connections making influence ``cascade'' in a decentralized network. To overcome this, we design the first method to estimate \textbf{D}ata \textbf{I}nfluence \textbf{C}ascad\textbf{E} (DICE) in a decentralized environment. Theoretically, the framework derives tractable approximations of influence cascade over arbitrary neighbor hops, suggesting the influence cascade is determined by an interplay of data, communication topology, and the curvature of loss landscape. DICE also lays the foundations for applications including selecting suitable collaborators and identifying malicious behaviors. Project page is available at https://raiden-zhu.github.io/blog/2025/DICE/.